1 An efficient meta-lock for implementing ubiquitous synchronization
Ole Agesen, David Detlefs, Alex Garthwaite, Ross Knippel, Y. S. Ramakrishna, Derek White
October 1999 ACM SIGPLAN Notices, Proceedings of the 14th ACM SIGPLAN conference on
Object-oriented programming, systems, languages, and applications, Volume 34 Issue 10
Full text available: pdf(2.00 MB) Additional Information: full citation, abstract, references,
citings, index terms



Programs written in concurrent object-oriented languages, especially ones that employ 
thread-safe reusable class libraries, can execute synchronization operations (lock, notify, 
etc.) at an amazing rate. Unless implemented with utmost care, synchronization can become 
a performance bottleneck. Furthermore, in languages where every object may have its own 
monitor, per-object space overhead must be minimized. To address these concerns, we 
have developed a meta-lock to mediate access to synchro ... 



Keywords: concurrent threads, object-oriented language implementation, synchronization 



2 The performance implications of thread management alternatives for shared-memory
multiprocessors
T. E. Anderson, D. D. Lazowska, H. M. Levy
April 1989 ACM SIGMETRICS Performance Evaluation Review, Proceedings of the 1989 ACM
SIGMETRICS international conference on Measurement and modeling of computer systems,
Volume 17 Issue 1
Full text available: pdf(1.56 MB) Additional Information: full citation, abstract, references,
citings, index terms

Threads ("lightweight" processes) have become a common element of new languages and 
operating systems. This paper examines the performance implications of several data 
structure and algorithm alternatives for thread management in shared-memory 
multiprocessors. Both experimental measurements and analytical model projections are 
presented. For applications with fine-grained parallelism, small differences in thread 
management are shown to have significant performance imp ... 

3 Vertical profiling: understanding the behavior of object-oriented applications
Matthias Hauswirth, Peter F. Sweeney, Amer Diwan, Michael Hind
October 2004 ACM SIGPLAN Notices, Proceedings of the 19th annual ACM SIGPLAN Conference on
Object-oriented programming, systems, languages, and applications, Volume 39 Issue 10
Full text available: pdf(1.16 MB) Additional Information: full citation, abstract, references,
index terms

Object-oriented programming languages provide a rich set of features that provide 
significant software engineering benefits. The increased productivity provided by these 
features comes at a justifiable cost in a more sophisticated runtime system whose 
responsibility is to implement these features efficiently. However, the virtualization 
introduced by this sophistication provides a significant challenge to understanding complete 
system performance, not found in traditionally compiled languages ... 

Keywords: hardware performance monitors, perturbation, software performance monitors, 
vertical profiling, whole-system analysis 

4 Implementation of Argus
B. Liskov, D. Curtis, P. Johnson, R. Scheifler
November 1987 ACM SIGOPS Operating Systems Review, Proceedings of the eleventh ACM
Symposium on Operating systems principles, Volume 21 Issue 5
Full text available: pdf(1.34 MB) Additional Information: full citation, abstract, references,
citings, index terms

Argus is a programming language and system developed to support the construction and 
execution of distributed programs. This paper describes the implementation of Argus, with 
particular emphasis on the way we implement atomic actions, because this is where Argus 
differs most from other implemented systems. The paper also discusses the performance of 
Argus. The cost of actions is quite reasonable, indicating that action systems like Argus are 
practical. 

5 Locking effects in multiprocessor implementations of protocols
Mats Bjorkman, Per Gunningberg
October 1993 ACM SIGCOMM Computer Communication Review, Conference proceedings on
Communications architectures, protocols and applications, Volume 23 Issue 4
Full text available: pdf(1.06 MB) Additional Information: full citation, abstract, references,
citings, index terms, review

We investigate how to exploit shared memory multiprocessors for parallel protocol 
processing. We present a multiprocessor implementation of the x-kernel protocol 
environment from the University of Arizona. A "processor-per-message" paradigm is used to
partition the work over processors. Locks are used to protect shared protocol state and data. 
Mutual exclusion by locking can be costly if the parallel protocol code frequently accesses 
shared state and data. This paper addresses the effect of lock ... 

6 Executing Java threads in parallel in a distributed-memory environment
Mark W. MacBeth, Keith A. McGuigan, Philip J. Hatcher
November 1998 Proceedings of the 1998 conference of the Centre for Advanced Studies on
Collaborative research
Full text available: pdf(194.63 KB) Additional Information: full citation, abstract,
references, citings, index terms

We present the design and initial implementation of Hyperion, an environment for the high- 
performance execution of Java programs. Hyperion supports high performance by utilizing a 
Java-bytecode-to-C translator and by supporting parallel execution via the distribution of 
Java threads across the multiple processors of a cluster of Linux machines. The Hyperion 
run-time system implements the Java memory model using an efficient communication 
substrate previously developed for Linux and Fast Ethernet ... 

7 Techniques for obtaining high performance in Java programs
Iffat H. Kazi, Howard H. Chen, Berdenia Stanley, David J. Lilja
September 2000 ACM Computing Surveys (CSUR), Volume 32 Issue 3
Full text available: pdf(816.13 KB) Additional Information: full citation, abstract,
references, citings, index terms



This survey describes research directions in techniques to improve the performance of 
programs written in the Java programming language. The standard technique for Java 
execution is interpretation, which provides for extensive portability of programs. A Java 
interpreter dynamically executes Java bytecodes, which comprise the instruction set of the 
Java Virtual Machine (JVM). Execution time performance of Java programs can be improved 
through compilation, possibly at the expense of portabili ... 

Keywords: Java, Java virtual machine, bytecode-to-source translators, direct compilers, 
dynamic compilation, interpreters, just-in-time compilers 



8 Experience Using Multiprocessor Systems — A Status Report
Anita K. Jones, Peter Schwarz
June 1980 ACM Computing Surveys (CSUR), Volume 12 Issue 2
Full text available: pdf(4.48 MB) Additional Information: full citation, references, citings,
index terms



9 Performance modeling of multiprocessor implementations of protocols
Mats Bjorkman, Per Gunningberg
June 1998 IEEE/ACM Transactions on Networking (TON), Volume 6 Issue 3
Full text available: pdf(285.80 KB) Additional Information: full citation, references,
citings, index terms, review



Keywords: multiprocessor, parallel communication protocols, performance modeling, 
queueing network model 



10 Source-level global optimizations for fine-grain distributed shared memory systems
R. Veldema, R. F. H. Hofman, R. A. F. Bhoedjang, C. J. H. Jacobs, H. E. Bal
June 2001 ACM SIGPLAN Notices, Proceedings of the eighth ACM SIGPLAN symposium on
Principles and practices of parallel programming, Volume 36 Issue 7
Full text available:
Additional Information: full citation, abstract, references, citings, index terms

This paper describes and evaluates the use of aggressive static analysis in Jackal, a
fine-grain Distributed Shared Memory (DSM) system for Java. Jackal uses an optimizing,
source-level compiler rather than the binary rewriting techniques employed by most other
fine-grain DSM systems. Source-level analysis makes existing access-check optimizations
(e.g., access-check batching) more effective and enables two novel fine-grain DSM
optimizations: object-graph aggregatio ...

11 Migration: Luna: a flexible Java protection system
Chris Hawblitzel, Thorsten von Eicken
December 2002 ACM SIGOPS Operating Systems Review, Volume 36 Issue SI
Full text available: pdf(1.39 MB) Additional Information: full citation, abstract, references

Extensible Java systems face a difficult trade-off between sharing and protection. On one 
hand, Java's ability to run different protection domains in a single virtual machine enables 
domains to share data easily and communicate without address space switches. On the
other hand, unrestricted sharing blurs the boundaries between protection domains, making 
it difficult to terminate domains and enforce restrictions on resource usage. Existing 
solutions to these problems restrict sharing in an ad-hoc ... 

12 Performance evaluation of the Orca shared-object system
Henri E. Bal, Raoul Bhoedjang, Rutger Hofman, Ceriel Jacobs, Koen Langendoen, Tim Ruhl,
M. Frans Kaashoek
February 1998 ACM Transactions on Computer Systems (TOCS), Volume 16 Issue 1
Full text available: pdf(179.39 KB) Additional Information: full citation, abstract,
references, citings, index terms, review

Orca is a portable, object-based distributed shared memory (DSM) system. This article 
studies and evaluates the design choices made in the Orca system and compares Orca with 
other DSMs. The article gives a quantitative analysis of Orca's coherence protocol (based on 
write-updates with function shipping), the totally ordered group communication protocol, the 
strategy for object placement, and the all-software, user-space architecture. Performance 
measurements for 10 parallel applications ill ... 

Keywords: distributed shared memory, parallel processing, portability 



13 Scalable concurrent counting
Maurice Herlihy, Beng-Hong Lim, Nir Shavit
November 1995 ACM Transactions on Computer Systems (TOCS), Volume 13 Issue 4
Full text available: pdf(1.27 MB) Additional Information: full citation, abstract, references,
citings, index terms, review

The notion of counting is central to a number of basic multiprocessor coordination problems, 
such as dynamic load balancing, barrier synchronization, and concurrent data structure 
design. We investigate the scalability of a variety of counting techniques for large-scale 
multiprocessors. We compare counting techniques based on: (1) spin locks, (2) message 
passing, (3) distributed queues, (4) software combining trees, and (5) counting networks. 
Our comparison is based on a series of simple be ... 

Keywords: combining trees, counting networks 



14 Waiting algorithms for synchronization in large-scale multiprocessors
Beng-Hong Lim, Anant Agarwal
August 1993 ACM Transactions on Computer Systems (TOCS), Volume 11 Issue 3
Full text available: pdf(2.72 MB) Additional Information: full citation, abstract, references,
citings, index terms

Through analysis and experiments, this paper investigates two-phase waiting algorithms to 
minimize the cost of waiting for synchronization in large-scale multiprocessors. In a two- 
phase algorithm, a thread first waits by polling a synchronization variable. If the cost of 
polling reaches a limit Lpoll and further waiting is necessary, the thread is blocked, incurring 
an additional fixed cost, B. The choice of Lpoll ...

Keywords: barriers, blocking, competitive analysis, locks, producer-consumer 
synchronization, spinning, waiting time 



15 Diffracting trees
Nir Shavit, Asaph Zemach
November 1996 ACM Transactions on Computer Systems (TOCS), Volume 14 Issue 4
Full text available: pdf(729.57 KB) Additional Information: full citation, abstract,
references, citings, index terms



Shared counters are among the most basic coordination structures in multiprocessor
computation, with applications ranging from barrier synchronization to
concurrent-data-structure design. This article introduces diffracting trees, novel data
structures for shared counting and load balancing in a distributed/parallel environment.
Empirical evidence, collected on a simulated distributed shared-memory machine and several
simulated message-passing architectures, shows that diffracting trees scal ...

Keywords: contention, counting networks, index distribution, lock free, wait free 



16 Multithreading and value prediction: Speculative lock elision: enabling highly concurrent
multithreaded execution
Ravi Rajwar, James R. Goodman
December 2001 Proceedings of the 34th annual ACM/IEEE international symposium on
Microarchitecture
Full text available: pdf, Publisher Site Additional Information: full citation, abstract,
references, citings

Serialization of threads due to critical sections is a fundamental bottleneck to achieving high 
performance in multithreaded programs. Dynamically, such serialization may be 
unnecessary because these critical sections could have safely executed concurrently without 
locks. Current processors cannot fully exploit such parallelism because they do not have 
mechanisms to dynamically detect such false inter-thread dependences. We propose 
Speculative Lock Elision (SLE), a novel micro-architectura ... 

17 The family of concurrent logic programming languages
Ehud Shapiro
September 1989 ACM Computing Surveys (CSUR), Volume 21 Issue 3
Full text available: pdf(9.62 MB) Additional Information: full citation, abstract, references,
citings, index terms

Concurrent logic languages are high-level programming languages for parallel and 
distributed systems that offer a wide range of both known and novel concurrent 
programming techniques. Being logic programming languages, they preserve many 
advantages of the abstract logic programming model, including the logical reading of 
programs and computations, the convenience of representing data structures with logical 
terms and manipulating them using unification, and the amenability to metaprogrammin ... 

18 Concurrency control: methods, performance, and analysis
Alexander Thomasian
March 1998 ACM Computing Surveys (CSUR), Volume 30 Issue 1
Full text available: pdf(427.18 KB) Additional Information: full citation, references,
citings, index terms



Keywords: Markov chains, adaptive methods, concurrency control, data contention, 
deadlocks, flow diagrams, load control, optimistic concurrency control, queueing network 
models, restart-oriented locking methods, serializability, thrashing, two-phase locking,
two-phase processing, wait depth limited methods 



19 Mostly lock-free malloc
Dave Dice, Alex Garthwaite
June 2002 ACM SIGPLAN Notices, Proceedings of the 3rd international symposium on Memory
management, Volume 38 Issue 2 supplement
Full text available: pdf(609.93 KB) Additional Information: full citation, abstract,
references, citings, index terms

Modern multithreaded applications, such as application servers and database engines, can 
severely stress the performance of user-level memory allocators like the ubiquitous malloc 
subsystem. Such allocators can prove to be a major scalability impediment for the 
applications that use them, particularly for applications with large numbers of threads 
running on high-order multiprocessor systems. This paper introduces Multi-Processor
Restartable Critical Sections, or MP-RCS. MP-RCS permits user-level ... 

Keywords: affinity, locality, lock-free operations, malloc, restartable critical sections 



20 A language with distributed scope
Luca Cardelli
January 1995 Proceedings of the 22nd ACM SIGPLAN-SIGACT symposium on Principles of
programming languages
Full text available:
Additional Information: full citation, abstract, references, citings, index terms

Obliq is a lexically-scoped, untyped, interpreted language that supports distributed object- 
oriented computation. Obliq objects have state and are local to a site. Obliq computations 
can roam over the network, while maintaining network connections. Distributed lexical 
scoping is the key mechanism for managing distributed computation. 